STATE PRICE DENSITY ESTIMATION VIA
NONPARAMETRIC MIXTURES

By Ming Yuan 

Georgia Institute of Technology 

We consider nonparametric estimation of the state price density 
encapsulated in option prices. Unlike usual density estimation prob- 
lems, we only observe option prices and their corresponding strike 
prices rather than samples from the state price density. We propose 
to model the state price density directly with a nonparametric mixture
and estimate it using least squares. We show that although the
minimization is taken over an infinite-dimensional function space,
the minimizer always admits a finite-dimensional representation and
can be computed efficiently. We also prove that the proposed estimate
of the state price density function converges to the truth at a
"nearly parametric" rate.

1. Introduction. In this paper we consider estimating the risk-neutral 
distribution encapsulated in option prices. Risk-neutral distributions, often 
characterized by state price densities, recovered from option prices reflect 
investors' expectations about the future returns of the underlying assets.
They manifest the preferences and risk aversion of a representative agent
[Aït-Sahalia and Lo (2000); Jackwerth (2000); Rosenberg and Engle (2002)].
Consider, for example, a European call option with maturity date T and 
strike price X. Under the no-arbitrage principle, its price at time t is
given by

(1.1)   $C(X, S_t, r_{t,T}, T) = e^{-r_{t,T}\tau} \int_0^\infty \psi(S_T) f(S_T)\, dS_T,$

where $\tau = T - t$ is the time to maturity, $r_{t,T}$ is the interest rate,
$\psi(S_T) = \max\{S_T - X, 0\}$ is the payoff function, and $f$ is the state
price density. For brevity, we leave implicit the dependence of $f$ on the
horizon as well as other economic variables such as the current asset price
$S_t$, the interest rate and the dividend yield over the period.

The renowned Black-Scholes model assumes that the underlying asset 
price process {St} follows a geometric Brownian motion and, therefore, the 
risk-neutral distribution is a log-normal distribution. Despite its elegance 
and popularity, it is now well understood that the log-normal assumption 
made by the Black-Scholes model can be problematic in practice and may 
result in severe bias of option prices. A number of econometric models have 
been developed to address this issue. Most notable examples include the 
stochastic volatility model and the GARCH model. The readers are referred 
to Garcia, Ghysels and Renault (2009) for a survey of recent developments 
in this direction. Although useful in a variety of contexts, these parametric 
models are still susceptible to model misspecification. 

Various nonparametric methods have been employed to overcome this 
problem. Derman and Kani (1994), Dupire (1994) and Rubinstein (1994) 
propose implied binomial tree techniques to recover the state price density 
from a set of option prices without assuming the log-normality. Buchen 
and Kelly (1996) and Stutzer (1996) reconstruct the state price density 
under the maximum entropy principle. Jackwerth and Rubinstein (1996) 
introduce a smoothness penalized estimate. However, little is known about 
the econometric properties of these methods. 

State price density estimation is closely related to the recovery of the
option pricing function C itself. As observed by Banz and Miller (1978) and
Breeden and Litzenberger (1978),

(1.2)   $f(X) = e^{r_{t,T}\tau}\, \frac{\partial^2 C}{\partial X^2}.$

Taking advantage of this relationship, the state price density can be derived
as the second derivative of an estimate of the pricing function C. In the
presence of pricing error, the estimation of the pricing function C can be
cast as a regression problem:

(1.3)   $C = C(X) + \varepsilon,$

where, with slight abuse of notation, we use $C$ to denote the observed option
price and $C(\cdot)$ to denote the correct pricing as a function of the strike
price, and $\varepsilon$ represents the pricing error. Various nonparametric
regression techniques have been applied to estimate $C(\cdot)$. In one of the
pioneering papers, Hutchinson, Lo and Poggio (1994) consider estimating C
nonparametrically using various learning networks. More recently, Aït-Sahalia
and Lo (1998) introduce a semiparametric alternative where the volatility of
the Black-Scholes formulation is modeled nonparametrically. The readers are
referred to Ghysels et al. (1997) and Fan (2005) for recent reviews of other
nonparametric methods for estimating the option pricing function or the state
price density.
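The relationship between the pricing function and the state price density can be checked numerically in a case where both are known. The sketch below is illustrative only: it assumes log-normal (Black-Scholes) prices with a dividend yield, and the parameter values and the function names `bs_call`, `lognormal_spd` and `spd_from_calls` are hypothetical choices, not taken from the paper. It compares the discount-adjusted second difference of the call price in the strike with the exact log-normal density.

```python
import math

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bs_call(X, S, r, delta, sigma, tau):
    """Black-Scholes price of a European call with strike X."""
    d1 = (math.log(S / X) + (r - delta + 0.5 * sigma ** 2) * tau) / (sigma * math.sqrt(tau))
    d2 = d1 - sigma * math.sqrt(tau)
    return S * math.exp(-delta * tau) * norm_cdf(d1) - X * math.exp(-r * tau) * norm_cdf(d2)

def lognormal_spd(x, S, r, delta, sigma, tau):
    """Log-normal state price density of S_T under the Black-Scholes model."""
    m = math.log(S) + (r - delta - 0.5 * sigma ** 2) * tau
    s = sigma * math.sqrt(tau)
    z = (math.log(x) - m) / s
    return math.exp(-0.5 * z * z) / (x * s * math.sqrt(2.0 * math.pi))

def spd_from_calls(X, h, S, r, delta, sigma, tau):
    """e^{r tau} times the central second difference of the call price in the strike."""
    c_up = bs_call(X + h, S, r, delta, sigma, tau)
    c_mid = bs_call(X, S, r, delta, sigma, tau)
    c_dn = bs_call(X - h, S, r, delta, sigma, tau)
    return math.exp(r * tau) * (c_up - 2.0 * c_mid + c_dn) / h ** 2

# Hypothetical market parameters chosen only for illustration.
S, r, delta, sigma, tau = 1365.0, 0.045, 0.025, 0.25, 30.0 / 365.0
checks = []
for X in (1300.0, 1365.0, 1430.0):
    approx = spd_from_calls(X, 1.0, S, r, delta, sigma, tau)
    exact = lognormal_spd(X, S, r, delta, sigma, tau)
    checks.append((X, approx, exact))
    print(X, approx, exact)
```

With noise-free prices the two columns agree closely; the difficulty addressed in the rest of the paper arises once the prices carry pricing error.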

From a statistical point of view, estimating the state price density now
becomes estimating the second derivative of a regression function. But unlike
other regression problems, the state price density needs to be a proper
density function that is non-negative and integrates to one. This dictates,
for example, that the price function $C(\cdot)$ is monotonically decreasing
and convex in the strike price X. More precisely,

$-e^{-r_{t,T}\tau} \le C'(X) \le 0,$

$C''(X) \ge 0.$

How to impose these constraints is the main difficulty in the nonparametric
regression (1.3). Aït-Sahalia and Duarte (2003) and Yatchew and Härdle (2005)
stress the importance of enforcing such shape constraints in estimating the
option pricing function, and each proposes a nonparametric estimate of C that
respects these constraints. It is shown that both approaches lead to improved
accuracy in recovering the pricing function and can guarantee the
non-negativity of the state price density estimate. Neither state price
density estimate, however, is guaranteed to integrate to one as required by
a proper density. A post-estimation normalization is needed to enforce this
constraint.

In this paper we develop a new approach to nonparametric estimation of the
state price density. We propose to estimate the regression function by
minimizing the (weighted) least squares over a set of admissible pricing
functions. The admissible pricing functions are deduced directly from the
very existence of a state price density. We consider a particular admissible
set of pricing functions whose corresponding state price density is a
nonparametric mixture of log-normals. We show that even though the
minimization is taken over an infinite-dimensional space, the minimizer
actually admits a finite-dimensional representation. In particular, all
solutions can be expressed as a convex combination of at most n + 1
Black-Scholes type pricing functions. In addition, we prove that, by focusing
on the set of admissible pricing functions, not only can the estimated state
price density be guaranteed to be a legitimate density function, but the
estimation accuracy can also be drastically improved. More specifically, we
show that as the sample size n increases, the pricing function can be
recovered with squared error converging to zero at the rate of
$\ln^2 n / n$, which is very close to the $1/n$ convergence rate that is
typically achieved only with much more restrictive parametric assumptions
such as log-normality. Further, we show that the integrated squared error of
the estimate of the state price density converges to zero at the rate of
$\ln^4 n / n$, which again differs from the usual parametric rate only by a
power of the logarithm of the sample size.

The rest of the paper is organized as follows. We describe the methodol- 
ogy in the next section. Section 3 discusses the asymptotic properties of the 
proposed estimate of both the call option prices and the state price density. 
The proposed estimating scheme is illustrated through an empirical study 
in Section 4. We close with some conclusions in the last section. All proofs 
are relegated to the Appendix. 

2. Method. In the Black-Scholes paradigm, the state price density $f$
corresponds to a log-normal distribution. More precisely, the log return
$\ln(S_T/S_t)$ follows a normal distribution with mean
$(r_{t,T} - \delta_{t,T} - \sigma^2/2)\tau$ and variance $\sigma^2\tau$, where
$\delta_{t,T}$ is the dividend yield in this period. Under this premise,
(1.1) yields

(2.1)   $C(X, S_t, r_{t,T}, T) = S_t e^{-\delta_{t,T}\tau}\,\Phi(d_1) - X e^{-r_{t,T}\tau}\,\Phi(d_2),$

where $\Phi(\cdot)$ is the cumulative distribution function of the standard
normal distribution and

$d_1 = \frac{\ln(S_t/X) + (r_{t,T} - \delta_{t,T} + \sigma^2/2)\tau}{\sigma\sqrt{\tau}}, \qquad d_2 = d_1 - \sigma\sqrt{\tau}.$

The Black-Scholes formula (2.1) prices a European call option with only
one parameter, $\sigma$, often referred to as the implied volatility. The
Black-Scholes model worked remarkably well in the early years of option
markets. It has, however, become increasingly evident that it fails to
explain the option prices observed in the post-1987 crash market [Rubinstein
(1994)]. To illustrate, in Figure 1, we plot a cross section of S&P 500 index
option prices versus the strike price during a three week span in December
2002. The options expired in March 2003. We shall explain the main data
characteristics in more detail in Section 4. Along with the observed data, we
also plot the best fit given by the Black-Scholes model. It can be observed
that the Black-Scholes model tends to underprice the deep in-the-money
options. The discrepancy can be as much as 25% or about $20, which is
rather significant.

Various approaches have been developed to improve the original Black-Scholes
model. In the Black-Scholes paradigm, the underlying asset price is
assumed to follow a geometric Brownian motion:

(2.2)   $dS = \eta S\,dt + \sigma S\,dB,$

where $\eta$ is the growth rate of the stock price and $B_t$ is a standard
Brownian motion. One popular alternative to the original Black-Scholes model
is the stochastic volatility model where $V = \sigma^2$ is further modeled as
a stochastic process:

(2.3)   $dV = \zeta V\,dt + \xi V\,dW,$

where $W$ is another standard Brownian motion. In the risk-neutral world,
the asset price and its instantaneous variance $\sigma^2$ follow similar
processes:

$dS = (r - \delta) S\,dt + \sigma S\,dB,$

$d\sigma^2 = \alpha\sigma^2\,dt + \xi\sigma^2\,dW.$

Hull and White (1987) showed that if $B_t$ and $W_t$ are two independent
Brownian motions, then the conditional distribution of $\ln(S_T/S_t)$ given
the "average" volatility

(2.4)   $\bar{V} = \frac{1}{\tau} \int_t^T \sigma^2(u)\,du$

is normal with mean $(r_{t,T} - \delta_{t,T} - \bar{V}/2)\tau$ and variance
$\bar{V}\tau$. More generally, using the same argument as Hull and White
(1987), it can be shown that the statement holds true as long as $B_t$ is
independent of the volatility process $V(t)$.

Fig. 1. S&P 500 index option prices together with the best Black-Scholes model fit.

In other words, under the stochastic volatility model, $\ln(S_T/S_t)$ follows
a mixture of normal distributions

(2.5)   $f(\ln(S_T/S_t)) = \int \phi(\ln(S_T/S_t) \mid (r_{t,T} - \delta_{t,T} - \bar{V}/2)\tau,\ \bar{V}\tau)\, dF(\bar{V}),$

where $\phi(\cdot \mid \mu, \sigma^2)$ is the normal density function with
parameters $\mu$ and $\sigma^2$, and $F$ is the distribution function of
$\bar{V}$. Clearly, this reduces to the Black-Scholes model when $F$ is a
degenerate distribution. Motivated by this and to allow for more flexibility,
we consider in this paper state price densities such that $\ln S_T$ follows a
nonparametric mixture of normal densities:

(2.6)   $h(\ln S_T) = \int \phi(\ln S_T \mid \mu, \sigma^2)\, dG(\mu, \sigma),$

where $G$, referred to as the mixing distribution, is an unknown bivariate
distribution function. The corresponding state price density can be written as

(2.7)   $f(S_T) = \int \nu(S_T \mid \mu, \sigma^2)\, dG(\mu, \sigma),$

where $\nu(\cdot \mid \mu, \sigma^2)$ is the density function of the
log-normal distribution with location parameter $\mu$ and scale parameter
$\sigma$. It is evident that when $G$ assigns probability one to
$(\ln(S_t) + (r_{t,T} - \delta_{t,T} - \sigma^2/2)\tau,\ \sigma\sqrt{\tau})$,
the proposed mixture model reduces to the Black-Scholes model. The stochastic
volatility model described above is also a special case of our nonparametric
mixture model. Unlike the Black-Scholes model, our model for the state price
density is nonparametric in that we do not impose any parametric assumption
on the mixing distribution $G$. Mixtures of form (2.6) are known to be a
rich family of distributions and can approximate any differentiable density
function to an arbitrary precision [Silverman (1986)].

Our goal is to extract the state price density function $f$ as well as the
pricing function $C$ given a set of observations on strike price and option
price pairs, $(X_1, C_1), (X_2, C_2), \ldots, (X_n, C_n)$, that follow a
regression relationship

(2.8)   $C_i = C(X_i) + \varepsilon_i, \qquad i = 1, 2, \ldots, n.$

In particular, we assume that the state price density function lies in the
following class:

(2.9)   $\mathcal{F} = \{ f(\cdot) : f(S) = \int \nu(S \mid \mu, \sigma^2)\, dG(\mu, \sigma),\ \mathrm{supp}(G) \subseteq [-M, M] \times [\underline{\sigma}, \bar{\sigma}] \},$

for some constants $M < \infty$ and $0 < \underline{\sigma} < \bar{\sigma} < \infty$.
Correspondingly, the pricing function belongs to the following function class:

(2.10)   $\mathcal{C} = \{ C(\cdot) : C(X) = e^{-r_{t,T}\tau} \int \psi(S) f(S)\, dS,\ f \in \mathcal{F} \}.$

Following Aït-Sahalia and Duarte (2003), we consider estimating the pricing
function by minimizing the weighted least squares:

(2.11)   $\hat{C}(\cdot) = \arg\min_{C(\cdot) \in \mathcal{C}} \frac{1}{n} \sum_{i=1}^{n} w_i (C_i - C(X_i))^2.$

As argued by Aït-Sahalia and Duarte (2003), the weights $w_i$ can be chosen
to reflect the relative liquidity of different options. More actively traded
options would receive a higher weight than less actively traded ones.
They also suggest that the actual weights be determined on the basis of the
size and time of the most recent transaction and the bid-ask spread, which
are readily available in practice. For brevity, in the following discussion,
we shall assume equal weights $w_1 = w_2 = \cdots = w_n = 1$. Our results,
however, also apply to the more general and realistic weighting schemes.

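Note that when the state price density is given by (2.7), the pricing
function is also determined by the mixing distribution $G$:

$C(X; G) = e^{-r_{t,T}\tau} \int \psi(S_T) f(S_T)\, dS_T$

$\quad = e^{-r_{t,T}\tau} \int \psi(S_T) \int \nu(S_T \mid \mu, \sigma^2)\, dG(\mu, \sigma)\, dS_T$

$\quad = \int \{ e^{-r_{t,T}\tau} \int \psi(S_T)\, \nu(S_T \mid \mu, \sigma^2)\, dS_T \}\, dG(\mu, \sigma)$

$\quad = \int C(X; \mu, \sigma^2)\, dG(\mu, \sigma),$

where

$C(X; \mu, \sigma^2) = e^{-r_{t,T}\tau} \int_0^\infty \psi(S_T)\, \nu(S_T \mid \mu, \sigma^2)\, dS_T$

$\quad = e^{-r_{t,T}\tau} \int_{-\infty}^{\infty} (e^s - X)_+\, \phi(s \mid \mu, \sigma^2)\, ds$

$\quad = e^{-r_{t,T}\tau} ( \int_{\ln X}^{\infty} e^s \phi_\sigma(s - \mu)\, ds - X \int_{\ln X}^{\infty} \phi_\sigma(s - \mu)\, ds )$

$\quad = e^{-r_{t,T}\tau} \{ e^{\mu + \sigma^2/2}\, \bar{\Phi}( \frac{\ln X - \mu - \sigma^2}{\sigma} ) - X\, \bar{\Phi}( \frac{\ln X - \mu}{\sigma} ) \},$

with $\phi_\sigma$ the density of the normal distribution with mean zero and
variance $\sigma^2$, and $\bar{\Phi}(\cdot) = 1 - \Phi(\cdot)$.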
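As a sanity check on the closed form of the component price $C(X; \mu, \sigma^2)$, the following sketch compares it with a direct numerical integration of the payoff against the normal density of $\ln S_T$. The function names, the midpoint rule, the truncation at ten standard deviations and the test values are assumptions made for illustration, and the discounting exponent is passed as a single hypothetical argument `r_tau`.

```python
import math

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def component_call(X, mu, sigma, r_tau=0.0):
    """Closed form of C(X; mu, sigma^2) when ln S_T ~ N(mu, sigma^2)."""
    upper = 1.0 - norm_cdf((math.log(X) - mu - sigma ** 2) / sigma)
    lower = 1.0 - norm_cdf((math.log(X) - mu) / sigma)
    return math.exp(-r_tau) * (math.exp(mu + 0.5 * sigma ** 2) * upper - X * lower)

def component_call_numeric(X, mu, sigma, r_tau=0.0, steps=20000):
    """Midpoint-rule integral of (e^s - X)_+ against the N(mu, sigma^2) density."""
    lo = math.log(X)
    hi = mu + 10.0 * sigma  # truncate the upper tail; the neglected mass is tiny
    if hi <= lo:
        return 0.0
    width = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        s = lo + (i + 0.5) * width
        dens = math.exp(-0.5 * ((s - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))
        total += (math.exp(s) - X) * dens * width
    return math.exp(-r_tau) * total

mu, sigma = math.log(1365.0), 0.07
pairs = [(component_call(X, mu, sigma), component_call_numeric(X, mu, sigma))
         for X in (1300.0, 1365.0, 1450.0)]
for closed, numeric in pairs:
    print(closed, numeric)
```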

The least squares estimate of the pricing function can be equivalently
written as $\hat{C}(\cdot) = C(\cdot; \hat{G})$ with

(2.12)   $\hat{G}(\cdot) = \arg\min_{G \in \mathcal{G}} \frac{1}{n} \sum_{j=1}^{n} (C_j - C(X_j; G))^2,$

where $\mathcal{G}$ is the collection of all probability measures on $\mu$ and
$\sigma^2$. Note that the minimization is taken over a function space of
infinite dimension, which does not appear directly computable at first
glance. However, as the following theorem shows, the solution can always be
represented in a finite-dimensional space, which makes the minimization
feasible.

Theorem 2.1. The minimum of (2.12) exists and there is a distribution
whose support contains no more than n + 1 points that achieves the minimum.
Furthermore, at each support point, $\sigma^2 = \underline{\sigma}^2$.

Theorem 2.1 is of great practical importance since it now suffices to find a
minimizer of (2.12) that has a support of n + 1 points or fewer, which can be
computed numerically. The theorem is similar to theorems for optimal design
[Silvey (1980)] and the famous result for mixture likelihood [Lindsay
(1983)].

In practice, it is common to ensure that the expected value of the price
of the underlying security under the risk-neutral measure is equal to the
forward price of the underlying. This constraint can be easily incorporated
in our framework. Note that when the state price density $f$ comes from
$\mathcal{F}$, the expected value of $S_T$ can be conveniently expressed as

(2.13)   $E_f(S) = \int e^{\mu + \sigma^2/2}\, dG(\mu, \sigma).$

Denote by $F_{t,T}$ the forward price of the underlying security. Enforcing
the aforementioned constraint means that instead of $\mathcal{F}$ we restrict
our attention to the following family of densities:

(2.14)   $\mathcal{F}^* = \{ f(\cdot) : f \in \mathcal{F},\ \int e^{\mu + \sigma^2/2}\, dG(\mu, \sigma) = F_{t,T} \}.$

Since $\mathcal{F}^*$ is a convex subset of $\mathcal{F}$, Theorem 2.1 remains
true. For the same reason, in the subsequent theoretical development we
neglect this constraint for brevity, but all of our discussion also applies
when the constraint is in place, with only minor notational changes.
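For a discrete mixing distribution with a common scale, the expectation in the forward-price constraint is a finite sum, which makes the constraint easy to check. The sketch below computes that sum; the helper `shift_to_forward`, which moves all means by a common constant so that the constraint holds, is a hypothetical device for illustration and is not the constrained estimation procedure used in the paper. All numerical values are made up.

```python
import math

def mixture_mean(weights, mus, sigma):
    """E_f(S) = sum_j pi_j * exp(mu_j + sigma^2 / 2) for a discrete mixing distribution."""
    return sum(w * math.exp(m + 0.5 * sigma ** 2) for w, m in zip(weights, mus))

def shift_to_forward(weights, mus, sigma, forward):
    """Shift every mean by one constant so that the mixture mean equals the forward price."""
    shift = math.log(forward / mixture_mean(weights, mus, sigma))
    return [m + shift for m in mus]

# Hypothetical three-component mixture for ln S_T and a hypothetical forward price.
weights = [0.3, 0.5, 0.2]
mus = [math.log(1300.0), math.log(1365.0), math.log(1420.0)]
sigma = 0.05
forward = 1365.0 * math.exp((0.045 - 0.025) * 0.25)

shifted = shift_to_forward(weights, mus, sigma, forward)
print(mixture_mean(weights, mus, sigma), mixture_mean(weights, shifted, sigma), forward)
```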

3. Theoretical properties. Before stating the main theoretical results,
we first describe a set of conditions for the pricing errors. Assume that the
pricing errors are independent and satisfy the following:

(a) $E(\varepsilon_i) = 0$, for $i = 1, \ldots, n$;

(b) for some $\beta > 0$, $\Gamma > 0$,

(3.1)   $\sup_n \max_{1 \le i \le n} E(\exp(\beta \varepsilon_i^2)) \le \Gamma < +\infty.$

Both conditions are rather mild. Condition (a) indicates that the observed
price is unbiased, which provides the basis for estimating the pricing
function. Condition (b) concerns how fast the tails of the error distributions
decay. Distributions that satisfy Condition (b) are often called
sub-Gaussian. When the pricing errors follow normal distributions, this
condition is satisfied. In most realistic situations, the pricing error is
bounded and this condition is also trivially satisfied.

We now consider the properties of the estimated price function.

Theorem 3.1. Let $\hat{C}_n$ be the minimizer of (2.11) over $\mathcal{C}$.
Then under Conditions (a) and (b) there exist constants $n_0$ and
$L_0, c_0 > 0$ such that, for any $n > n_0$ and $L > L_0$,

(3.2)   $P( \frac{\sqrt{n}}{\ln n} \|\hat{C}_n - C\|_n > L ) \le n^{-c_0 L^2},$

where

(3.3)   $\|\hat{C}_n - C\|_n^2 = \frac{1}{n} \sum_{i=1}^{n} \{\hat{C}_n(X_i) - C(X_i)\}^2.$

Consequently,

$\|\hat{C}_n - C\|_n^2 = O_p( \frac{\ln^2 n}{n} ).$

The convergence rate obtained in Theorem 3.1 is to be compared with the
usual parametric situation such as the Black-Scholes model. When assuming
that the state price density follows a log-normal distribution, the pricing
function is given by (2.1). The volatility can then be estimated, for
example, by least squares. Such a procedure would lead to the usual
parametric convergence rate $\|\hat{C}_n - C\|_n^2 = O_p(1/n)$. Note that,
although the state price density is assumed to reside in the much more
general family $\mathcal{F}$, the convergence rate we obtain here differs
from the parametric rate only by a factor of $\ln^2 n$.

Next, we study the properties of the estimated state price densities.

Theorem 3.2. Denote by $p_X$ the sampling density of the strike prices
$X_1, \ldots, X_n$. Let $\Omega$ be an open set such that

(3.4)   $\min_{x \in \Omega} p_X(x) \ge L_0 > 0$

for some constant $L_0$. Then under Conditions (a) and (b)

(3.5)   $\int_\Omega (\hat{f}_n - f)^2 = O_p( \frac{\ln^4 n}{n} ).$

Similar to the price function, the state price density estimate converges at
a "nearly" parametric rate, now with an extra factor of $\ln^4 n$. Compared
with the price function, the rate is slightly slower. It is typical in
nonparametric statistics that differentiation results in a slower convergence
rate. Different from the usual nonparametric setting, however, the
convergence rate deteriorates only by a factor of $\ln^2 n$. In contrast, for
both approaches from Aït-Sahalia and Duarte (2003) and Yatchew and Härdle
(2005), the price function can be estimated at convergence rate
$n^{-2q/(1+2q)}$ and the state price density at $n^{-2(q-2)/(1+2q)}$, both in
the integrated squared error sense when assuming that the price function is
$q$ times differentiable.

In characterizing the performance of the state price density estimate,
confining attention to a set such as $\Omega$ is often necessary. The state
price density is estimated based on observed pairs of strike and call price.
Because we rarely observe strikes from regions where $p_X$ is close to zero,
it is impossible to estimate the density well in these regions without
further restrictions. In practice, this is rarely a limitation because the
strikes are most often evenly distributed in a compact region around the
asset price. In our notation, this amounts to $p_X$ being a uniform
distribution, and the set $\Omega$ of Theorem 3.2 can be taken as the whole
support of the distribution.

4. Numerical studies. 

4.1. Implementation. Theorem 2.1 shows that, without loss of generality,
the estimated state price density admits the following expression:

$\hat{f}(\ln(S_T/S_t)) = \pi_1 \phi(\ln(S_T/S_t); \mu_1, \underline{\sigma}^2) + \pi_2 \phi(\ln(S_T/S_t); \mu_2, \underline{\sigma}^2) + \cdots + \pi_{n+1} \phi(\ln(S_T/S_t); \mu_{n+1}, \underline{\sigma}^2),$

and it suffices to estimate the mixing proportions $\pi_j$ and the means
$\mu_j$ in minimizing the least squares. We propose to iteratively compute
the mixing proportions and the means for a given $\underline{\sigma}^2$. Given
$\mu_1, \ldots, \mu_{n+1}$, updating the mixing proportions can be cast as a
quadratic program and easily solved using standard quadratic programming
solvers. Once the mixing proportions are available, we update the means by
Newton-Raphson iterations.

The constraint that the expected value of the price of the underlying
security under the state price density is equal to the forward price can also
be easily incorporated in this algorithm, as it can now be conveniently
expressed as

(4.1)   $F_{t,T} = S_t e^{\underline{\sigma}^2/2} ( \pi_1 e^{\mu_1} + \cdots + \pi_{n+1} e^{\mu_{n+1}} ).$

It is of great importance to choose a good initial value. A careful
examination of the proofs of Theorems 3.1 and 3.2 suggests that any density
from $\mathcal{F}$ can be approximated well by a member of $\mathcal{F}$ whose
means $\mu$ are equally spaced in $[-M, M]$. Motivated by this fact, we take
equally spaced means as the initial value. The algorithm therefore starts
with a natural initial solution, which is already a good estimate.
A limited number of iterations is usually sufficient to achieve good
performance in practical applications. We observe empirically that the least
squares objective function decreases quickly in the first iteration, and the
objective function after the first iteration is already very close to its
value at convergence, as the magnitude of the decrease in the first
iteration dominates the decreases in subsequent iterations. This motivates
us to use a one-step iteration in our implementation.
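The mixing-proportion update for fixed means can be sketched as follows. This is a minimal illustration under several assumptions, not the implementation used in the paper: the means are held at an equally spaced grid (the suggested initial value) and never updated, they are parameterized on $\ln S_T$ rather than on the log return, discounting is a single hypothetical `discount` factor, and the quadratic program is replaced by a plain Frank-Wolfe iteration with exact line search over the probability simplex. All names and data are hypothetical.

```python
import math

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def component_call(X, mu, sigma, discount=1.0):
    """C(X; mu, sigma^2) for ln S_T ~ N(mu, sigma^2), times a discount factor."""
    a = 1.0 - norm_cdf((math.log(X) - mu - sigma ** 2) / sigma)
    b = 1.0 - norm_cdf((math.log(X) - mu) / sigma)
    return discount * (math.exp(mu + 0.5 * sigma ** 2) * a - X * b)

def sse(fitted, prices):
    """Sum of squared pricing errors."""
    return sum((f - c) ** 2 for f, c in zip(fitted, prices))

def fit_weights(strikes, prices, mus, sigma, discount=1.0, iters=2000):
    """Least squares over the probability simplex by Frank-Wolfe (means held fixed)."""
    A = [[component_call(X, m, sigma, discount) for m in mus] for X in strikes]
    n, m = len(strikes), len(mus)
    pi = [1.0 / m] * m
    fitted = [sum(A[i][j] * pi[j] for j in range(m)) for i in range(n)]
    for _ in range(iters):
        resid = [fitted[i] - prices[i] for i in range(n)]
        grad = [sum(A[i][j] * resid[i] for i in range(n)) for j in range(m)]
        j_star = min(range(m), key=grad.__getitem__)  # best single component
        direction = [A[i][j_star] - fitted[i] for i in range(n)]
        denom = sum(d * d for d in direction)
        if denom == 0.0:
            break
        # Exact line search for the quadratic objective, clipped to [0, 1].
        step = -sum(r * d for r, d in zip(resid, direction)) / denom
        step = max(0.0, min(1.0, step))
        if step == 0.0:
            break
        pi = [(1.0 - step) * p for p in pi]
        pi[j_star] += step
        fitted = [f + step * d for f, d in zip(fitted, direction)]
    return pi, fitted

# Hypothetical noise-free data generated from a two-component mixture on the grid.
S0, sigma = 1365.0, 0.05
mus = [math.log(S0) + 0.05 * k for k in range(-4, 5)]
strikes = [1000.0 + 50.0 * i for i in range(15)]
truth = [0.0, 0.0, 0.0, 0.5, 0.0, 0.5, 0.0, 0.0, 0.0]
prices = [sum(w * component_call(X, m, sigma) for w, m in zip(truth, mus)) for X in strikes]

start = [sum(component_call(X, m, sigma) for m in mus) / len(mus) for X in strikes]
pi, fitted = fit_weights(strikes, prices, mus, sigma)
print(sse(start, prices), sse(fitted, prices))
```

Because every iterate is a convex combination of component prices, the fitted curve always corresponds to a legitimate mixture density, which is the property the finite-dimensional representation is meant to deliver.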

4.2. Simulation. To gain insight into the finite sample performance of the
proposed method, we first conducted a set of simulation studies. We adopted
the experimental setting of Aït-Sahalia and Duarte (2003), which was designed
to mimic S&P 500 index options. In particular, the current index price was
set at 1365, the short term interest rate at 4.5%, the dividend yield at 2.5%,
and the time to maturity at 30 days. The volatility smile was assumed to be
a linear function of the strike with volatility equal to 40% at the strike
price 1000 and 20% at the strike price 1700. We assume that we observe n = 25
option prices with strike prices equally spaced between 1000 and 1700. The
option prices were simulated by adding uniformly distributed random noise
to the theoretical option prices. Following Aït-Sahalia and Duarte (2003),
the range of the noise varies linearly from 3% of the option value for deep
in the money options to 18% for deep out of the money options.

Fig. 2. Estimated call option price function and state price density summarized over
5000 simulation runs. [Panels: "Call Price Function" (horizontal axis: strike) and
"State Price Density" (horizontal axis: log-return).]
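The simulation design described above can be sketched in a few lines. Several details are assumptions made here for illustration rather than facts from the paper: 30 days is converted to years with a 365-day count, the "range" of the noise is read as the full width of a symmetric uniform interval around the theoretical price, and the noise range is interpolated linearly in the strike between the two stated endpoints. Function names are hypothetical.

```python
import math
import random

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bs_call(X, S, r, delta, sigma, tau):
    """Black-Scholes price of a European call with strike X."""
    d1 = (math.log(S / X) + (r - delta + 0.5 * sigma ** 2) * tau) / (sigma * math.sqrt(tau))
    d2 = d1 - sigma * math.sqrt(tau)
    return S * math.exp(-delta * tau) * norm_cdf(d1) - X * math.exp(-r * tau) * norm_cdf(d2)

def smile_vol(X):
    """Linear smile: 40% volatility at strike 1000, 20% at strike 1700."""
    return 0.40 + (0.20 - 0.40) * (X - 1000.0) / 700.0

def noise_range(X):
    """Relative noise range: 3% at strike 1000 (deep in the money), 18% at strike 1700."""
    return 0.03 + (0.18 - 0.03) * (X - 1000.0) / 700.0

def simulate(n=25, seed=0):
    """Return strikes, theoretical prices and noisy prices for one simulation run."""
    rng = random.Random(seed)
    S, r, delta, tau = 1365.0, 0.045, 0.025, 30.0 / 365.0
    strikes = [1000.0 + 700.0 * i / (n - 1) for i in range(n)]
    clean = [bs_call(X, S, r, delta, smile_vol(X), tau) for X in strikes]
    noisy = [c * (1.0 + rng.uniform(-0.5, 0.5) * noise_range(X))
             for X, c in zip(strikes, clean)]
    return strikes, clean, noisy

strikes, clean, noisy = simulate()
print(strikes[0], clean[0], noisy[0])
```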

For each run, the call function and the corresponding state price density
are estimated by the proposed method with $\underline{\sigma}$ chosen by
leave-one-out cross-validation [Wahba (1990)]. Figure 2 displays the average
estimates and 95% pointwise confidence intervals for the call price function
and the state price density based on 5000 simulations. It is evident that the
estimate works very well.

4.3. Real data analysis. To illustrate the proposed methodology, we now
go back to the historical option data briefly mentioned in Section 2. The data
consist of a cross section of European call option prices written on the S&P
500 index during the first three weeks of December, 2002. Figure 3 shows the
closing price of the index itself and the Eurodollar deposit rates (London)
in the same period. The deposit rate is used as the risk-free interest rate.
Because the maturity ranges from 3 months to about 4 months, we linearly
interpolated the 3 month rate and 6 month rate to yield the daily risk-free
rate.

[Fig. 3. Panels: "S&P 500 Index Price" and "Eurodollar Deposit Rates".]

The cross section of the option prices is given in the leftmost panel of
Figure 4. Following convention, we use the average of the end-of-day bid and
ask prices as the option price. Different lines correspond to different dates.
It is clear that the option price can be modeled as a smooth function of the
strike. This leads to the misperception that we can always estimate the state
price density by directly differentiating an interpolation of the option
prices. Such a naive strategy does not work in practice, however. To
elaborate, the middle and right panels of Figure 4 show $dC/dX$ and
$d^2C/dX^2$, respectively, estimated by straightforward differentiation. It
can be observed that the derivatives are much more wiggly as functions of the
strike and, furthermore, there is no guarantee that the resulting estimate of
the state price density is positive as required by a legitimate density.

Fig. 4. Historical European call option prices for S&P 500 index from December 2,
2002 to December 20, 2002 and its derivatives with respect to the strike. Different lines
correspond to different dates.
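The fragility of direct differentiation is easy to reproduce on synthetic data. The sketch below is an illustration under stated assumptions, not a reanalysis of the historical data: prices are Black-Scholes values with hypothetical parameters, and the pricing error is replaced by a deterministic perturbation of plus or minus $0.50 alternating across strikes. Second differences of the exact prices stay non-negative, while the perturbed prices produce negative "density" values.

```python
import math

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bs_call(X, S, r, delta, sigma, tau):
    """Black-Scholes price of a European call with strike X."""
    d1 = (math.log(S / X) + (r - delta + 0.5 * sigma ** 2) * tau) / (sigma * math.sqrt(tau))
    d2 = d1 - sigma * math.sqrt(tau)
    return S * math.exp(-delta * tau) * norm_cdf(d1) - X * math.exp(-r * tau) * norm_cdf(d2)

def second_difference_density(strikes, prices, r, tau):
    """e^{r tau} times the central second difference in strike (equally spaced strikes)."""
    h = strikes[1] - strikes[0]
    return [math.exp(r * tau) * (prices[i + 1] - 2.0 * prices[i] + prices[i - 1]) / h ** 2
            for i in range(1, len(strikes) - 1)]

# Hypothetical parameters: three months to maturity, strikes every 25 index points.
S, r, delta, sigma, tau = 1365.0, 0.045, 0.025, 0.25, 0.25
strikes = [1000.0 + 25.0 * i for i in range(29)]
clean = [bs_call(X, S, r, delta, sigma, tau) for X in strikes]
perturbed = [c + 0.5 * (-1) ** i for i, c in enumerate(clean)]  # stand-in for pricing error

dens_clean = second_difference_density(strikes, clean, r, tau)
dens_noisy = second_difference_density(strikes, perturbed, r, tau)
print(min(dens_clean), min(dens_noisy))
```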



We now apply the proposed method to the option prices on a daily basis.
As in Aït-Sahalia and Duarte (2003), we set the weights to be the inverse of
the option price since the prices fluctuate considerably more when the price
itself is high. The Black-Scholes model fit reported in Section 2 is produced
in the same fashion. We also reconstruct the dividend rate through the
put-call parity:

(4.2)   $P_t + S_t e^{-\delta\tau} = C_t + X e^{-r\tau}$

using the put-call pair at the money. We choose $\underline{\sigma}$ using
leave-one-out cross-validation. Our experience suggests, however, that three
quarters of the volatility obtained from the Black-Scholes model fit usually
works fairly well in practice. The estimated pricing functions and state
price densities are given in Figures 5 and 6, respectively. In contrast to
the Black-Scholes model fit shown in Figure 1, our nonparametric estimate
fits the historical option prices very well. The departure of the underlying
state price densities from log-normality is also evident from Figure 6.
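Solving the put-call parity relation for the dividend rate gives $\delta = -\ln\{(C_t - P_t + X e^{-r\tau})/S_t\}/\tau$. A minimal sketch of this step follows; the function name and the numerical inputs are hypothetical, and the put price in the example is constructed from parity itself so that the round trip can be checked.

```python
import math

def implied_dividend_yield(call, put, S, X, r, tau):
    """Solve P + S * exp(-delta * tau) = C + X * exp(-r * tau) for delta."""
    pv_underlying = call - put + X * math.exp(-r * tau)
    if pv_underlying <= 0.0:
        raise ValueError("prices are inconsistent with put-call parity")
    return -math.log(pv_underlying / S) / tau

# Hypothetical at-the-money pair; the put is built from parity with delta = 2%.
S, X, r, tau, true_delta = 1365.0, 1365.0, 0.045, 0.25, 0.02
call = 70.0
put = call + X * math.exp(-r * tau) - S * math.exp(-true_delta * tau)
recovered = implied_dividend_yield(call, put, S, X, r, tau)
print(recovered)
```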

These nonparametric state price density estimates can have many uses.
For example, as pointed out in Aït-Sahalia and Duarte (2003), they can
be employed to price new and more complex or less liquid options in an
arbitrage-free fashion. With the knowledge of the entire state price density
available, it is also straightforward to derive interesting quantities such
as value-at-risk. Our nonparametric estimate can provide more reliable
information than its parametric counterpart from these perspectives. For
example, as evidenced in Figure 6, the Black-Scholes paradigm may
significantly underestimate investment risk.

5. Conclusions. In this paper we introduced a new nonparametric option
pricing technique. We considered state price densities that can be
represented as a nonparametric mixture of log-normals. Our nonparametric
model is inspired by the stochastic volatility model of Hull and White (1987)
and extends the original Black-Scholes model. Both the option price function
and the state price density can be estimated by least squares. We showed
that such estimates enjoy nice asymptotic properties. An application to
historical data also demonstrates the merits of the proposed methodology in
finite samples.

APPENDIX 

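Proof of Theorem 2.1. The existence of a minimum comes from the
fact that both the objective function and the feasible region are convex.
Denote

(5.1)   $\mathcal{A} = \{ C(\cdot; \mu, \underline{\sigma}^2) : \mu \in \mathbb{R} \}.$

It is not hard to see that $\mathcal{C}$ is a subset of the convex hull of
$\mathcal{A}$. Let $\hat{G}$ be a minimizer of (2.12). Clearly,
$(C(X_1; \hat{G}), C(X_2; \hat{G}), \ldots, C(X_n; \hat{G}))'$ is an element
of the convex hull spanned by $\mathcal{A}$. By Carathéodory's theorem,
there exists a subset $\mathcal{B}$ of $\mathcal{A}$ consisting of n + 1
points or fewer so that
$(C(X_1; \hat{G}), C(X_2; \hat{G}), \ldots, C(X_n; \hat{G}))'$ is in the
convex hull spanned by $\mathcal{B}$. In other words, we can find
$\mu_1, \mu_2, \ldots, \mu_{n+1} \in \mathbb{R}$ so that

(5.2)   $C(X_i; \hat{G}) = \sum_{j=1}^{n+1} \pi_j C(X_i; \mu_j, \underline{\sigma}^2) \qquad \forall i = 1, \ldots, n$

for some $\pi_j \ge 0$ and $\pi_1 + \cdots + \pi_{n+1} = 1$. Therefore, the
discrete distribution placing mass $\pi_j$ at $(\mu_j, \underline{\sigma})$,

(5.3)   $\tilde{G}(\cdot) = \sum_{j=1}^{n+1} \pi_j \delta_{(\mu_j, \underline{\sigma})}(\cdot),$

minimizes (2.12). □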

Proof of Theorem 3.1. Without loss of generality, we assume that
$r_{t,T}$ is zero throughout the proof.

For $\varepsilon > 0$, the $\varepsilon$-covering number of $\mathcal{C}$,
$N(\varepsilon, \mathcal{C}, \|\cdot\|_\infty)$, is defined as the number of
balls with radius $\varepsilon$ necessary to cover $\mathcal{C}$. Denote by

(5.4)   $H(\varepsilon, \mathcal{C}, \|\cdot\|_\infty) = \ln N(\varepsilon, \mathcal{C}, \|\cdot\|_\infty)$

the $\varepsilon$-entropy of $\mathcal{C}$. In the light of Theorem 4.1 from
van de Geer (1990), it suffices to show that, for any $\delta > 0$ small
enough,

(5.5)   $\int_0^\delta H^{1/2}(u, \mathcal{C}, \|\cdot\|_\infty)\, du \le C \delta \ln\frac{1}{\delta}.$


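We now set out to establish this inequality. We proceed by explicitly
constructing an $\varepsilon$-covering set for $\mathcal{C}$.

An application of the Taylor expansion yields

(5.6)   $| \phi(u) - \frac{1}{\sqrt{2\pi}} \sum_{j=0}^{k-1} \frac{(-1)^j u^{2j}}{2^j j!} | \le \frac{1}{\sqrt{2\pi}} \frac{u^{2k}}{2^k k!} \le \frac{1}{\sqrt{2\pi}} ( \frac{e u^2}{2k} )^k.$

Therefore, for any $U > |u|$,

$| \bar{\Phi}(u) - \int_u^U \frac{1}{\sqrt{2\pi}} \sum_{j=0}^{k-1} \frac{(-1)^j z^{2j}}{2^j j!}\, dz |$

$\quad \le \int_U^\infty \phi(z)\, dz + \int_u^U | \phi(z) - \frac{1}{\sqrt{2\pi}} \sum_{j=0}^{k-1} \frac{(-1)^j z^{2j}}{2^j j!} |\, dz$

$\quad \le C \phi(U)/U + \int_u^U \frac{1}{\sqrt{2\pi}} ( \frac{e z^2}{2k} )^k\, dz$

$\quad = C \phi(U)/U + \frac{U^{2k+1} - u^{2k+1}}{\sqrt{2\pi}(2k+1)} ( \frac{e}{2k} )^k$

$\quad \le C \{ \phi(U)/U + ( \frac{e U^2}{2k} )^{k+1/2} \}.$

Hereafter, we use $C > 0$ as a generic constant.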

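Now consider distribution functions $F$ and $G$ such that

(5.7)   $\int u^j\, dF(u) = \int u^j\, dG(u), \qquad j = 1, \ldots, 2k - 1.$

Then, writing $x = \ln X$ for the log strike,

$| \int C(x; \mu, \sigma)\, dF(\mu) - \int C(x; \mu, \sigma)\, dG(\mu) |$

$\quad = | \int C(x; \mu, \sigma)\, d\{F(\mu) - G(\mu)\} |$

$\quad \le e^{\sigma^2/2 + M} | \int \bar{\Phi}( \frac{x - \mu - \sigma^2}{\sigma} )\, d\{F(\mu) - G(\mu)\} | + e^x | \int \bar{\Phi}( \frac{x - \mu}{\sigma} )\, d\{F(\mu) - G(\mu)\} |$

$\quad \le e^{\sigma^2/2 + M} | \int \int_{(x - \mu - \sigma^2)/\sigma}^{U} \frac{1}{\sqrt{2\pi}} \sum_{j=0}^{k-1} \frac{(-1)^j z^{2j}}{2^j j!}\, dz\, d\{F(\mu) - G(\mu)\} |$

$\qquad + e^x | \int \int_{(x - \mu)/\sigma}^{U} \frac{1}{\sqrt{2\pi}} \sum_{j=0}^{k-1} \frac{(-1)^j z^{2j}}{2^j j!}\, dz\, d\{F(\mu) - G(\mu)\} |$

$\qquad + (e^{\sigma^2/2 + M} + e^x)\, C \{ \phi(U)/U + ( \frac{e U^2}{2k} )^{k+1/2} \}.$

The inner integrals are polynomials in $\mu$ of degree at most $2k - 1$, so
the first two terms vanish by (5.7). Choosing $U = \sqrt{k}/2$ yields

(5.8)   $| \int C(x; \mu, \sigma)\, dF(\mu) - \int C(x; \mu, \sigma)\, dG(\mu) | \le C (1 + r)^{-k} / \sqrt{k}$

for some $0 < r < \exp(1/(2\sigma^2)) - 1$ and all $x < \sqrt{k}/2$.
On the other hand, when $x > \sqrt{k}/2$,

$\int C(x; \mu, \sigma)\, dF(\mu) \le e^{\sigma^2/2 + M} \int \bar{\Phi}( \frac{x - (\mu + \sigma^2)}{\sigma} )\, dF(\mu)$

$\quad \le e^{\sigma^2/2 + M}\, \bar{\Phi}( \frac{x - M - \sigma^2}{\sigma} )$

$\quad \le C e^{\sigma^2/2 + M}\, \phi( \frac{x - M - \sigma^2}{\sigma} ) ( \frac{x - M - \sigma^2}{\sigma} )^{-1}$

$\quad \le C (1 + r)^{-k} / \sqrt{k}.$

Subsequently, for any $x > \sqrt{k}/2$,

$| \int C(x; \mu, \sigma)\, dF(\mu) - \int C(x; \mu, \sigma)\, dG(\mu) |$

$\quad \le \int C(x; \mu, \sigma)\, dF(\mu) + \int C(x; \mu, \sigma)\, dG(\mu)$

$\quad \le C (1 + r)^{-k} / \sqrt{k}.$

In summary, we conclude that

(5.9)   $\| \int C(\cdot; \mu, \sigma)\, dF(\mu) - \int C(\cdot; \mu, \sigma)\, dG(\mu) \|_\infty \le C (1 + r)^{-k} / \sqrt{k}.$
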

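Lemma A.1 of Ghosal and van der Vaart (2001) shows that, for any
distribution function $F$, there exists a probability measure $G$ supported
on $2k$ points so that (5.7) is satisfied. In other words, we can always find
a probability measure $G$ with support on at most $O(\ln(1/\varepsilon))$
points such that

(5.10)   $\| \int C(\cdot; \mu, \sigma)\, dF(\mu) - \int C(\cdot; \mu, \sigma)\, dG(\mu) \|_\infty < \varepsilon/2.$

Now note that, for $\varepsilon\ (> 0)$ small enough, with
$\Phi(s; \mu, \sigma^2)$ denoting the normal distribution function with mean
$\mu$ and variance $\sigma^2$,

$| C(X; \mu, \sigma) - C(X; \mu + \varepsilon, \sigma) |$

$\quad = | \int_{\ln X}^{+\infty} (e^{s+\varepsilon} - e^s)\, d\Phi(s; \mu, \sigma^2) + \int_{\ln X - \varepsilon}^{\ln X} (e^{s+\varepsilon} - X)\, d\Phi(s; \mu, \sigma^2) |$

$\quad = | (e^\varepsilon - 1) e^{\sigma^2/2 + \mu} \{ 1 - \Phi( \frac{\ln X - (\mu + \sigma^2)}{\sigma} ) \}$

$\qquad + e^{\sigma^2/2 + \mu + \varepsilon} \{ \Phi( \frac{\ln X - (\mu + \sigma^2)}{\sigma} ) - \Phi( \frac{\ln X - (\mu + \sigma^2 + \varepsilon)}{\sigma} ) \}$

$\qquad - X \{ \Phi( \frac{\ln X - \mu}{\sigma} ) - \Phi( \frac{\ln X - (\mu + \varepsilon)}{\sigma} ) \} |$

$\quad \le (e^\varepsilon - 1) e^{\sigma^2/2 + \mu} + \varepsilon e^{\sigma^2/2 + \mu + \varepsilon} / (\sqrt{2\pi}\, \sigma)$

$\quad \le \{ (2 + 1/\sigma) e^{\sigma^2/2 + M} \} \varepsilon = K \varepsilon.$

In conclusion, for any mixing distribution $F$ we can always find a $G$
that is supported only on
$0, \pm\varepsilon/K, \pm 2\varepsilon/K, \ldots, \pm \lfloor KM/\varepsilon \rfloor \varepsilon/K$
such that

(5.11)   $\| \int C(\cdot; \mu, \sigma)\, dF(\mu) - \int C(\cdot; \mu, \sigma)\, dG(\mu) \|_\infty \le \varepsilon.$

Together with the fact that there exists a generic constant $D > 0$ such that

(5.12)   $| C(X; \mu, \sigma) - C(X; \mu, \sigma + \varepsilon) | \le D \varepsilon,$

this implies the following bound on the covering number for $\mathcal{C}$:

(5.13)   $N(\varepsilon, \mathcal{C}, \|\cdot\|_\infty) \le C ( \frac{1}{\varepsilon} )^{C \ln(1/\varepsilon)}.$

Therefore, the entropy can be bounded as well:

(5.14)   $H(\varepsilon, \mathcal{C}, \|\cdot\|_\infty) = \ln N(\varepsilon, \mathcal{C}, \|\cdot\|_\infty) \le C ( \ln\frac{1}{\varepsilon} )^2.$

Hence, for $\delta$ small enough,

(5.15)   $\int_0^\delta H^{1/2}(u, \mathcal{C}, \|\cdot\|_\infty)\, du \le C \delta \ln\frac{1}{\delta}.$

The proof is now completed. □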

Proof of Theorem 3.2. Denote $h = \hat{C}_n - C$. From Theorem 3.1,

(5.16)   $\int_\Omega h^2 \le \frac{1}{L_0} \int_\Omega h^2 p_X \le \frac{1}{L_0} \int h^2 p_X = O_p( \frac{\ln^2 n}{n} ).$

It is not hard to see that $h'' = \hat{f}_n - f$. By Parseval's theorem,

(5.17)   $\int (h^{[k]})^2 = \int \| \mathcal{F}\{h^{[k]}\} \|^2,$

where $\mathcal{F}\{h^{[k]}\}$ is the continuous Fourier transform of
$h^{[k]}$. Furthermore, note that

(5.18)   $\mathcal{F}\{h^{[k]}\}(s) = s^{k-2} \mathcal{F}\{h''\}(s) = \sum_{j=1}^{n+1} \pi_j s^{k-2} \exp( -i \mu_j s - \frac{\sigma^2 s^2}{2} ).$

By the triangle inequality,

(5.19)   $\| \mathcal{F}\{h^{[k]}\}(s) \| \le \sum_{j=1}^{n+1} \pi_j |s|^{k-2} \exp( -\frac{\sigma^2 s^2}{2} ) \le |s|^{k-2} \exp( -\frac{\sigma^2 s^2}{2} ).$

Together with (5.17), we have

(5.20)   $\int (h^{[k]})^2 \le \int s^{2k-4} \exp(-\sigma^2 s^2)\, ds = \frac{\sqrt{\pi}}{\sigma} \frac{(2k-4)!}{2^{k-2} (k-2)!} (2\sigma^2)^{-(k-2)}.$

An application of the Kolmogorov interpolation inequality yields

$\int_\Omega (\hat{f}_n - f)^2 = \int_\Omega (h'')^2$

$\quad \le A ( \int_\Omega h^2 )^{1 - 2/k} ( \int (h^{[k]})^2 )^{2/k}$

$\quad = O_p( ( \frac{\ln^2 n}{n} )^{1 - 2/k} k^2 ).$

The proof can now be completed by setting $k = \ln n$. □